A double classification tree search algorithm for index SNP selection
BACKGROUND: In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute force algorithms that are not feasible for large data sets. RESULTS: We have developed a double classification tree search algorithm to generate index SNPs that can distinguish all SNP and haplotype patterns. This algorithm runs very rapidly and generates very good, though not necessarily minimum, sets of index SNPs, as is to be expected for such NP-complete problems. CONCLUSIONS: A new algorithm for index SNP selection has been developed. A webserver for index SNP selection is available a
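Although the abstract does not detail the double classification tree search, the underlying task — choosing a small set of SNPs that distinguishes all haplotype patterns — can be viewed as set cover over haplotype pairs. A minimal greedy sketch of that formulation (not the paper's algorithm; the data layout and names are illustrative assumptions):

```python
from itertools import combinations

def greedy_index_snps(haplotypes):
    """Greedily pick index SNPs until every pair of haplotypes is
    distinguished by at least one chosen SNP (set-cover heuristic).
    haplotypes: list of equal-length allele lists, one per haplotype."""
    n, m = len(haplotypes), len(haplotypes[0])
    # haplotype pairs not yet distinguished by any chosen SNP
    uncovered = set(combinations(range(n), 2))
    chosen = []
    while uncovered:
        # SNP that separates the most still-uncovered pairs
        best = max(range(m),
                   key=lambda s: sum(haplotypes[i][s] != haplotypes[j][s]
                                     for i, j in uncovered))
        newly = {(i, j) for i, j in uncovered
                 if haplotypes[i][best] != haplotypes[j][best]}
        if not newly:
            break  # remaining pairs are identical at every SNP
        chosen.append(best)
        uncovered -= newly
    return chosen
```

Like the paper's method, a greedy heuristic yields good but not necessarily minimum index sets, which is the expected trade-off for an NP-complete problem.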
Verifying Data Constraint Equivalence in FinTech Systems
Data constraints are widely used in FinTech systems for monitoring data consistency and diagnosing anomalous data manipulations. However, many equivalent data constraints are created redundantly during the development cycle, slowing down the FinTech systems and causing unnecessary alerts. We present EqDAC, an efficient decision procedure for determining data constraint equivalence. We first propose a symbolic representation for semantic encoding and then introduce two lightweight analyses to refute and prove equivalence, respectively, both of which are proven to run in polynomial time. We evaluate EqDAC upon 30,801 data constraints in a FinTech system. It is shown that EqDAC detects 11,538 equivalent data constraints in three hours. It also supports efficient equivalence searching with an average time cost of 1.22 seconds, enabling the system to check new data constraints upon submission.
Comment: 14 pages, 11 figures, accepted by ICSE 202
Gene functional similarity search tool (GFSST)
BACKGROUND: With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic association studies, biomarker and drug target selection, and animal models of human diseases, it is essential to have search engines that can retrieve genes by their functions from proteome databases. In recent years, the development of Gene Ontology (GO) has established structured, controlled vocabularies describing gene functions, which makes it possible to develop novel tools to search genes by functional similarity. RESULTS: By using a statistical model to measure the functional similarity of genes based on the Gene Ontology directed acyclic graph, we developed a novel Gene Functional Similarity Search Tool (GFSST) to identify genes with related functions from annotated proteome databases. This search engine lets users design their search targets by gene functions. CONCLUSION: An implementation of GFSST which works on the UniProt (Universal Protein Resource) for the human and mouse proteomes is available at GFSST Web Server. GFSST provides functions not only for similar gene retrieval but also for gene search by one or more GO terms. This represents a powerful new approach for selecting similar genes and gene products from proteome databases according to their functions
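As a toy illustration of measuring functional similarity over the Gene Ontology directed acyclic graph, the sketch below scores two GO terms by the overlap of their ancestor sets. This is a simple stand-in, not GFSST's actual statistical model; the term names and parent map are hypothetical:

```python
def ancestors(term, parents):
    """All ancestors of a GO term in the DAG, including the term itself.
    parents maps a term to the list of its direct parent terms."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def term_similarity(t1, t2, parents):
    """Jaccard overlap of ancestor sets: shared ancestry in the GO DAG
    serves as a crude proxy for functional similarity."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)
```

Two sibling terms under a common parent score higher than unrelated terms, which is the intuition behind DAG-based functional similarity search.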
Synthesizing Conjunctive Queries for Code Search
This paper presents Squid, a new conjunctive query synthesis algorithm for searching code with target patterns. Given positive and negative examples along with a natural language description, Squid analyzes the relations derived from the examples by a Datalog-based program analyzer and synthesizes a conjunctive query expressing the search intent. The synthesized query can be further used to search for desired grammatical constructs in the editor. To achieve high efficiency, we prune the huge search space by removing unnecessary relations and enumerating query candidates via refinement. We also introduce two quantitative metrics for query prioritization to select the queries from multiple candidates, yielding desired queries for code search. We have evaluated Squid on over thirty code search tasks. It is shown that Squid successfully synthesizes the conjunctive queries for all the tasks, taking only 2.56 seconds on average
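A conjunctive query of the kind Squid synthesizes can be answered by joining its atoms over relations extracted from code. A minimal evaluator sketch (the relation names, query shape, and facts below are illustrative assumptions, not Squid's actual representation):

```python
def eval_conjunctive_query(atoms, facts):
    """Evaluate a conjunctive query by nested joins.
    atoms: list of (relation_name, variable_tuple) pairs.
    facts: dict mapping relation_name -> set of value tuples.
    Returns all variable bindings satisfying every atom."""
    bindings = [{}]
    for rel, vars_ in atoms:
        extended = []
        for b in bindings:
            for tup in facts[rel]:
                nb = dict(b)
                consistent = True
                for var, val in zip(vars_, tup):
                    if nb.get(var, val) != val:
                        consistent = False  # clashes with earlier binding
                        break
                    nb[var] = val
                if consistent:
                    extended.append(nb)
        bindings = extended
    return bindings
```

For example, the two-atom query calls(X, Y) ∧ calls(Y, Z) finds two-step call chains in a "calls" relation.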
Endoluminal Motion Recognition of a Magnetically-Guided Capsule Endoscope Based on Capsule-Tissue Interaction Force
A magnetically-guided capsule endoscope, embedding flexible force sensors, is designed to measure the capsule-tissue interaction force. The flexible force sensor is composed of eight force-sensitive elements surrounding the internal permanent magnet (IPM). Controlling the interaction force acting on the intestinal wall can reduce the patient's discomfort and maintain the magnetic coupling between the external permanent magnet (EPM) and the IPM during capsule navigation; the flexible force sensor makes this control possible. In particular, by analyzing the signals of the force-sensitive elements, we propose a method to recognize the motion status of the magnetic capsule, and provide corresponding formulas to evaluate whether the magnetic capsule follows the motion of the external driving magnet. Accuracy of the motion recognition in ex vivo tests reached 94% when the EPM was translated along the longitudinal axis. In addition, a method is proposed to realign the EPM and the IPM before the loss of their magnetic coupling. Its translational error, rotational error, and runtime are 7.04 ± 0.71 mm, 3.13 ± 0.47°, and 11.4 ± 0.39 s, respectively. Finally, a control strategy is proposed to prevent the magnetic capsule endoscope from losing control during magnetically-guided capsule colonoscopy
Photometric Stereo-Based Depth Map Reconstruction for Monocular Capsule Endoscopy
The capsule endoscopy robot can only use monocular vision due to the dimensional limit. To improve the depth perception of the monocular capsule endoscopy robot, this paper proposes a photometric stereo-based depth map reconstruction method. First, based on the characteristics of the capsule endoscopy robot system, a photometric stereo framework is established. Then, by combining the specular property and Lambertian property of the object surface, the depth of the specular highlight point is estimated, and the depth map of the whole object surface is reconstructed by a forward upwind scheme. To evaluate the precision of the depth estimation of the specular highlight region and the depth map reconstruction of the object surface, simulations and experiments are implemented with synthetic images and pig colon tissue, respectively. The results of the simulations and experiments show that the proposed method provides good precision for depth map reconstruction in monocular capsule endoscopy
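For context, the classical Lambertian photometric-stereo step — recovering a per-pixel surface normal and albedo from intensities under known light directions — can be sketched as below. The paper's method additionally exploits the specular highlight to anchor depth and propagates it with a forward upwind scheme, which this sketch omits:

```python
import numpy as np

def photometric_stereo_normal(intensities, light_dirs):
    """Recover the surface normal and albedo at one pixel from k >= 3
    intensities under known light directions, assuming a Lambertian
    surface: I = albedo * (n . l). Solves the linear system in the
    least-squares sense."""
    L = np.asarray(light_dirs, dtype=float)   # k x 3 light directions
    I = np.asarray(intensities, dtype=float)  # k observed intensities
    g, *_ = np.linalg.lstsq(L, I, rcond=None) # g = albedo * n
    albedo = np.linalg.norm(g)
    return g / albedo, albedo
```

With a single light source, as on a monocular capsule, this system is underdetermined, which is why the paper combines specular and Lambertian cues instead of plain photometric stereo.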
Optimal Step Length EM Algorithm (OSLEM) for the estimation of haplotype frequency and its application in lipoprotein lipase genotyping
Background: Haplotype-based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g. Monte Carlo (Gibbs sampling), EM (expectation maximization), and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data. Results: These algorithms can be very slow for large numbers of SNPs. In order to speed them up, we have developed a new algorithm using numerical analysis techniques, the so-called optimal step length EM (OSLEM), which accelerates the calculation. By approximately optimizing the step length of the EM algorithm, OSLEM can run at about twice the speed of standard EM. This algorithm has been used for lipoprotein lipase (LPL) genotyping analysis. Conclusions: This new optimal step length EM (OSLEM) algorithm can accelerate haplotype frequency estimation for genotyping data without pedigree information. An OSLEM on-line server is available, as well as a free downloadable version
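The step-length idea can be sketched generically: take the direction of one ordinary EM update and extrapolate along it by the step length that most improves the log-likelihood. This is an illustrative acceleration skeleton, not the paper's exact OSLEM update; the `em_update` and `loglik` callbacks and the candidate step set are assumptions:

```python
def accelerated_em_step(theta, em_update, loglik, steps=(1.0, 1.5, 2.0)):
    """One step-length-accelerated EM iteration (sketch).
    Standard EM corresponds to step length 1.0; larger steps
    extrapolate along the EM direction d = EM(theta) - theta,
    and we keep whichever candidate maximizes the log-likelihood."""
    base = em_update(theta)
    d = [b - t for b, t in zip(base, theta)]
    best = max(steps,
               key=lambda a: loglik([t + a * di for t, di in zip(theta, d)]))
    return [t + best * di for t, di in zip(theta, d)]
```

When the likelihood surface is locally well-behaved, a step length near 2 roughly halves the number of EM iterations, which matches the abstract's reported two-fold speedup.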